NSF PAR Search | NSF Public Access Repository

ConfuGuard: Using Metadata to Detect Active and Stealthy Package Confusion Attacks Accurately and at Scale

Jiang, Wenxin; Çakar, Berk; Lysenko, Mikola; Davis, James C (August 2025, International Conference on Software Engineering (ICSE) 2026)

Package confusion attacks such as typosquatting threaten software supply chains. Attackers make packages with names that syntactically or semantically resemble legitimate ones, tricking engineers into installing malware. While prior work has developed defenses against package confusions in some software package registries, notably NPM, PyPI, and RubyGems, gaps remain: high false-positive rates, generalization to more software package ecosystems, and insights from real-world deployment. In this work, we introduce ConfuGuard, a state-of-the-art detector for package confusion threats. We begin by presenting the first empirical analysis of benign signals derived from prior package confusion data, uncovering their threat patterns, engineering practices, and measurable attributes. Advancing existing detectors, we leverage package metadata to distinguish benign packages, and extend support from three up to seven software package registries. Our approach significantly reduces false positive rates (from 80% to 28%), at the cost of an additional 14s average latency to filter out benign packages by analyzing the package metadata. ConfuGuard is used in production at our industry partner, whose analysts have already confirmed 630 real attacks detected by ConfuGuard.
Free, publicly-accessible full text available August 1, 2026
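The ConfuGuard abstract above describes attackers choosing names that syntactically resemble legitimate packages. A minimal sketch of that syntactic-similarity idea follows, using plain Levenshtein edit distance. This is an illustrative assumption, not ConfuGuard's actual detector: the function names `edit_distance` and `flag_lookalikes`, the threshold, and the sample package list are all hypothetical, and the sketch omits the metadata analysis the abstract credits for reducing false positives.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def flag_lookalikes(candidate: str, popular: List[str],
                    max_distance: int = 1) -> List[str]:
    """Return popular names within max_distance edits of the candidate.

    Hypothetical helper: a name-only check like this is the kind of signal
    that would need a further filter to separate benign lookalikes from
    malicious ones.
    """
    return [name for name in popular
            if name != candidate and edit_distance(candidate, name) <= max_distance]


if __name__ == "__main__":
    print(flag_lookalikes("requestss", ["requests", "numpy"]))
```

A name-only check flags every close neighbor, benign or not, which is consistent with the high false-positive rates the abstract attributes to prior detectors.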
SoK: A Literature and Engineering Review of Regular Expression Denial of Service (ReDoS)

Bhuiyan, Masudul; Çakar, Berk; Burmans, Ethan H; Davis, James C; Staicu, Cristian-Alexandru (August 2025, The 20th ACM ASIA Conference on Computer and Communications Security (ACM ASIACCS 2025))

Regular Expression Denial of Service (ReDoS) is a vulnerability class that has become prominent in recent years. Attackers can weaponize such weaknesses as part of asymmetric cyberattacks that exploit the slow worst-case matching time of regular expression (regex) engines. In the past, problematic regular expressions have led to outages at Cloudflare and Stack Overflow, showing the severity of the problem. While ReDoS has drawn significant research attention, there has been no systematization of knowledge to delineate the state of the art and identify opportunities for further research. In this paper, we describe the existing knowledge on ReDoS. We first provide a systematic literature review, discussing approaches for detecting, preventing, and mitigating ReDoS vulnerabilities. Then, our engineering review surveys the latest regex engines to examine whether and how ReDoS defenses have been realized. Combining our findings, we observe that (1) in the literature, almost no studies evaluate whether and how ReDoS vulnerabilities can be weaponized against real systems, making it difficult to assess their real-world impact; and (2) from an engineering view, many mainstream regex engines now have ReDoS defenses, rendering many threat models obsolete. We conclude with an extensive discussion, highlighting avenues for future work. The open challenges in ReDoS research are to evaluate emerging defenses and support engineers in migrating to defended engines. We also highlight the parallel between performance bugs and asymmetric DoS, and we argue that future work should capitalize more on this similarity and adopt a more systematic view on ReDoS-like vulnerabilities.
Free, publicly-accessible full text available August 25, 2026
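The ReDoS abstract above refers to the slow worst-case matching time of regex engines. The sketch below is a simplified model of that behavior, not code from the paper: a hypothetical function `count_attempts` simulates how a naive backtracking matcher might explore the pattern `(a+)+$` and counts the alternatives it tries. The pattern, the function name, and the matching strategy are assumptions chosen for illustration; real engines differ in detail, and, as the abstract notes, many now ship defenses.

```python
def count_attempts(text: str) -> int:
    """Count alternatives tried by a naive backtracking match of (a+)+$.

    Simplified model: the inner a+ is tried greedily (longest first), and on
    failure the matcher backtracks to each shorter length in turn.
    """
    attempts = 0

    def match_group(pos: int) -> bool:
        nonlocal attempts
        # Find how far a run of 'a' extends from pos.
        end = pos
        while end < len(text) and text[end] == "a":
            end += 1
        # Try every non-empty length for this iteration of a+, longest first.
        for stop in range(end, pos, -1):
            attempts += 1
            if stop == len(text):   # the $ anchor matches at end of input
                return True
            if match_group(stop):   # another iteration of the outer (...)+
                return True
        return False

    match_group(0)
    return attempts


if __name__ == "__main__":
    for n in (5, 10, 15):
        # A trailing 'b' makes the overall match fail, forcing full backtracking.
        print(n, count_attempts("a" * n + "b"))
```

In this model a matching input finishes after one attempt, while a non-matching input of n letters followed by `b` costs 2^n - 1 attempts, so each extra character roughly doubles the work. That asymmetry between a short input and a large matching cost is what the abstract calls an asymmetric attack.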
Detecting Music Performance Errors with Transformers

https://doi.org/10.1609/aaai.v39i22.34539

Chou, Benjamin Shiue-Hal; Jajal, Purvish; Eliopoulos, Nicholas John; Nadolsky, Tim; Yang, Cheng-Yun; Ravi, Nikita; Davis, James C; Yun, Kristen Yeon-Ji; Lu, Yung-Hsiang (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

Beginner musicians often struggle to identify specific errors in their performances, such as playing incorrect notes or rhythms. There are two limitations in existing tools for music error detection: (1) Existing approaches rely on automatic alignment; therefore, they are prone to errors caused by small deviations between alignment targets; (2) There is insufficient data to train music error detection models, resulting in over-reliance on heuristics. To address (1), we propose a novel transformer model, Polytune, that takes audio inputs and outputs annotated music scores. This model can be trained end-to-end to implicitly align and compare performance audio with music scores through latent space representations. To address (2), we present a novel data generation technique capable of creating large-scale synthetic music error datasets. Our approach achieves a 64.1% average Error Detection F1 score, improving upon prior work by 40 percentage points across 14 instruments. Additionally, our model can handle multiple instruments compared with existing transcription methods repurposed for music error detection.
Free, publicly-accessible full text available April 11, 2026
Token Turing Machines are Efficient Vision Models

https://doi.org/10.48550/arXiv.2409.07613

Jajal, Purvish; Eliopoulos, Nick John; Chou, Benjamin Shiue-Hal; Thiruvathukal, George K; Davis, James C; Lu, Yung-Hsiang (February 2025, The Computer Vision Foundation.)

We propose Vision Token Turing Machines (ViTTM), an efficient, low-latency, memory-augmented Vision Transformer (ViT). Our approach builds on Neural Turing Machines and Token Turing Machines, which were applied to NLP and sequential visual understanding tasks. ViTTMs are designed for non-sequential computer vision tasks such as image classification and segmentation. Our model creates two sets of tokens: process tokens and memory tokens; process tokens pass through encoder blocks and read-write from memory tokens at each encoder block in the network, allowing them to store and retrieve information from memory. By ensuring that there are fewer process tokens than memory tokens, we are able to reduce the inference time of the network while maintaining its accuracy. On ImageNet-1K, the state-of-the-art ViT-B has median latency of 529.5ms and 81.0% accuracy, while our ViTTM-B is 56% faster (234.1ms), with 2.4 times fewer FLOPs, with an accuracy of 82.9%. On ADE20K semantic segmentation, ViT-B achieves 45.65 mIoU at 13.8 frames per second (FPS), whereas our ViTTM-B model achieves 45.17 mIoU at 26.8 FPS (+94%).
Free, publicly-accessible full text available February 28, 2026
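The ViTTM abstract above reports its speedups as percentages. The short check below works through that arithmetic using only the four figures quoted in the abstract, under the assumption that "56% faster" means a 56% reduction in median latency and "+94%" means a relative increase in frames per second. The helper names are hypothetical.

```python
def percent_reduction(baseline: float, new: float) -> float:
    """Relative decrease from baseline to new, in percent."""
    return 100.0 * (baseline - new) / baseline


def percent_increase(baseline: float, new: float) -> float:
    """Relative increase from baseline to new, in percent."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    # ImageNet-1K median latency in ms: ViT-B 529.5, ViTTM-B 234.1.
    print(round(percent_reduction(529.5, 234.1)))
    # ADE20K throughput in FPS: ViT-B 13.8, ViTTM-B 26.8.
    print(round(percent_increase(13.8, 26.8)))
```

Under that reading, both quoted percentages follow from the raw numbers: the latency drops by about 55.8% and the throughput rises by about 94.2%.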
Finding 709 Defects in 258 Projects: An Experience Report on Applying CodeQL to Open-Source Embedded Software (Experience Paper)

https://doi.org/10.5281/zenodo.15200316

Shen, Mingjie; Pillai, Akul Abhilash; Yuan, Brian A; Davis, James C; Machiry, Aravind (January 2025, Zenodo)

This artifact contains the GitHub workflows to run CodeQL on EMBOSS repositories in our dataset, the results of running CodeQL on these repositories, and our manual analysis of CodeQL results.
Pruning One More Token is Enough: Leveraging Latency-Workload Non-Linearities for Vision Transformers on the Edge

https://doi.org/10.1109/WACV61041.2025.00695

Eliopoulos, Nicholas John; Jajal, Purvish; Davis, James C; Liu, Gaowen; Thiruvathukal, George K; Lu, Yung-Hsiang (February 2025, IEEE)

Free, publicly-accessible full text available February 26, 2026
$$ZTD_{\text{JAVA}}$$: Mitigating Software Supply Chain Vulnerabilities via Zero-Trust Dependencies

https://doi.org/10.1109/ICSE55347.2025.00148

Amusuo, Paschal C; Robinson, Kyle A; Singla, Tanmay; Peng, Huiyun; Machiry, Aravind; Torres-Arias, Santiago; Simon, Laurent; Davis, James C (April 2025, IEEE)

Free, publicly-accessible full text available April 26, 2026
Token Turing Machines are Efficient Vision Models

https://doi.org/10.1109/WACV61041.2025.00767

Jajal, Purvish; Eliopoulos, Nick John; Chou, Benjamin Shiue-Hal; Thiruvathukal, George K; Davis, James C; Lu, Yung-Hsiang (February 2025, IEEE)

Free, publicly-accessible full text available February 26, 2026
Signing in Four Public Software Package Registries: Quantity, Quality, and Influencing Factors

https://doi.org/10.1109/SP54263.2024.00215

Schorlemmer, Taylor R; Kalu, Kelechi G; Chigges, Luke; Ko, Kyung Myung; Ishgair, Eman Abu; Bagchi, Saurabh; Torres-Arias, Santiago; Davis, James C (May 2024, Proceedings of the IEEE Symposium on Security and Privacy)

Full Text Available
Reflecting on the Use of the Policy-Process-Product Theory in Empirical Software Engineering

https://doi.org/10.1145/3611643.3613075

Kalu, Kelechi G; Schorlemmer, Taylor R; Chen, Sophie; Robinson, Kyle A; Kocinare, Erik; Davis, James C (November 2023, ACM)

The primary theory of software engineering is that an organization’s Policies and Processes influence the quality of its Products. We call this the PPP Theory. Although empirical software engineering research has grown common, it is unclear whether researchers are trying to evaluate the PPP Theory. To assess this, we analyzed half (33) of the empirical works published over the last two years in three prominent software engineering conferences. In this sample, 70% focus on policies/processes or products, not both. Only 33% provided measurements relating policy/process and products. We make four recommendations: (1) Use PPP Theory in study design; (2) Study feedback relationships; (3) Diversify the studied feed-forward relationships; and (4) Disentangle policy and process. Let us remember that research results are in the context of, and with respect to, the relationship between software products, processes, and policies.
Full Text Available